Data Mining in Construction Wiki
What is Data Mining?
Data mining is the process of collecting information and organizing it in a way that can be understood and interpreted. Organized this way, the data reveals patterns that allow us to make predictions about future events.

How Can Data Mining Be Utilized?
Data mining is used in many everyday things that we may not expect. Simply using the Internet makes us subjects of data mining. Google uses your search history and how often you visit a website to decide which advertisements to show on the webpages you visit.

Supermarkets also use data mining. If you sign up for a membership card, you become subject to data mining: the store saves your information in a large database and looks for patterns between you and other shoppers. For example, the supermarket can store information about the types of purchases each customer makes and use it to improve everyone's shopping experience. Stores also monitor which products are being bought and how quickly, which lets them keep enough inventory to meet customer demand.

The construction engineering industry can also use data mining. A good application in construction is estimating whether a project will have a cost overrun or underrun, or come out even. Using the project description, the line items, and the low bid for the project, a model can be built to determine the likelihood that the project will have an overrun.

Data Preparation
The first step in data mining is collecting data that is pertinent to the topic you are researching. After the data is collected, it needs to be organized so that it can be used effectively. In the project overrun example above, the data was organized in an Excel file. Data mining produces the best results when the data is well organized.

Missing data is a very common problem in data mining. A missing value can mean different things in your data.
Perhaps the data was not available or not applicable, or the event simply did not happen. Several data mining methods exist for treating missing values. In general, they either ignore the missing values or infer them from the existing values.

Inconsistent data is another problem in data mining. The instruments used to collect the data may have been faulty, or computer or human errors may have occurred during data entry. Data that is inconsistent with our models or with common sense should be deleted.

In some cases we obtain so much data that data reduction becomes necessary. Data reduction methods condense the information in a large dataset into a smaller volume that produces the same or similar analytical results.

Another important step in preparing the data is defining the scope of the research. In the construction example, the aim is to find out how likely construction projects are to have a cost overrun. The idea is to find a correlation between the words used in the project description and the likelihood of an overrun. The project description and the five most expensive line items were used as the text in this example because they are the most relevant and help produce the best results.

Software Used in Data Mining
After the data is collected and organized in Excel, it can be imported into different programs that analyze it. Many free programs are available, such as GATE and KNIME. RapidMiner is one of the most popular and powerful free options. Commercial software such as SAS Enterprise Miner and IBM SPSS Modeler can also be purchased. The examples in this wiki use RapidMiner.

Modeling
Modeling is the most important part of the data mining process, and a good model is needed to obtain useful results. There are two types of modeling: prediction and classification. In the engineering field, we will mostly use predictive modeling.
We can use a predictive model to determine whether a project is likely to have a cost overrun. This helps a company decide whether it is worth the risk to bid on the project.
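The data preparation steps described under Data Preparation can be sketched in plain Python. The sketch covers treating missing values, deleting inconsistent records, and building the analysis text from the project description and the five most expensive line items. It is a minimal illustration under stated assumptions, not the RapidMiner workflow used in this wiki. The record layout, the field names (`description`, `low_bid`, `line_items`), the "a bid must be positive" rule, and the choice of mean imputation are all hypothetical.

```python
from statistics import mean

# Hypothetical project records; field names and values are illustrative only.
projects = [
    {"id": 1, "description": "Bridge deck overlay", "low_bid": 1_200_000,
     "line_items": [("Concrete", 400_000), ("Rebar", 150_000),
                    ("Traffic control", 80_000), ("Excavation", 120_000),
                    ("Asphalt", 200_000), ("Signs", 10_000)]},
    {"id": 2, "description": "Culvert replacement", "low_bid": None,  # missing value
     "line_items": [("Pipe", 90_000), ("Excavation", 60_000)]},
    {"id": 3, "description": "Resurfacing", "low_bid": -5,  # inconsistent value
     "line_items": [("Asphalt", 300_000)]},
]

def is_consistent(project):
    """Assumed common-sense rule: a recorded low bid must be positive."""
    bid = project["low_bid"]
    return bid is None or bid > 0

def impute_missing_bids(records):
    """Infer missing low bids from the existing values (mean imputation)."""
    known = [r["low_bid"] for r in records if r["low_bid"] is not None]
    fill = mean(known) if known else 0
    return [{**r, "low_bid": r["low_bid"] if r["low_bid"] is not None else fill}
            for r in records]

def project_text(project, top_n=5):
    """Join the description with the names of the top_n most expensive line items."""
    top = sorted(project["line_items"], key=lambda item: item[1], reverse=True)[:top_n]
    return " ".join([project["description"]] + [name for name, _ in top])

# Delete inconsistent records first, then fill in the remaining missing values.
cleaned = impute_missing_bids([p for p in projects if is_consistent(p)])
for p in cleaned:
    print(p["id"], p["low_bid"], project_text(p))
```

Project 3 is deleted because its negative bid contradicts common sense. Project 2 keeps its record but gets an inferred bid. Ignoring the incomplete record instead would simply mean filtering out rows whose `low_bid` is `None`.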
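The idea of relating the words in a project's text to the likelihood of a cost overrun can be illustrated with a small word-count classifier. This sketch swaps in a multinomial naive Bayes model written in plain Python. That technique is an assumption on our part and is not necessarily the model built in RapidMiner for this wiki. The training texts and labels are made-up toy examples, not real project data. The class `OverrunModel` and its methods are hypothetical names.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

class OverrunModel:
    """Multinomial naive Bayes over words, with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def log_scores(self, text):
        """Log prior plus the summed log likelihood of each known word, per class."""
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for c in self.classes:
            total_words = sum(self.word_counts[c].values())
            score = math.log(self.doc_counts[c] / total_docs)
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are ignored
                    count = self.word_counts[c][word]
                    score += math.log((count + 1) / (total_words + len(self.vocab)))
            scores[c] = score
        return scores

    def predict_proba(self, text):
        """Convert log scores into probabilities that sum to 1."""
        scores = self.log_scores(text)
        top = max(scores.values())
        exp = {c: math.exp(s - top) for c, s in scores.items()}
        norm = sum(exp.values())
        return {c: v / norm for c, v in exp.items()}

# Toy training texts (description plus costliest line items) with hypothetical outcomes.
texts = [
    "bridge rehabilitation deck repair concrete",
    "bridge widening excavation concrete",
    "asphalt resurfacing striping",
    "asphalt overlay signs",
]
labels = ["overrun", "overrun", "no_overrun", "no_overrun"]

model = OverrunModel().fit(texts, labels)
probs = model.predict_proba("bridge deck concrete repair")
print(f"P(overrun) = {probs['overrun']:.2f}")  # prints P(overrun) = 0.95
```

The same code accepts any set of labels, so the three outcomes mentioned earlier (overrun, underrun, even) could be used in place of the two toy labels. With real data, the low bid could be added as a separate numeric feature alongside the words.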